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Other 


PORTING APPLICATION PROGRAM ONTO SPIRIT-30 


This document describes my experiences of dealing with the SPIRIT-30 board while 
writing an application program. I have an application program, which creates fractal 
images. I have the program written in Microsoft C, and it works with a CGA or an EGA 
graphics adapter. The PC program is called IFC.EXE. I would like to port this program 
to SPIRIT-30 board. With this goal in mind, couple of ounces of some magic potion, 
and a positive attitude towards computers, I set to do the task. All the files referred to 
in this application note are available on the disk named APPLICATION NOTE’. 


To begin with, modify your C:\AUTOEXEC.BAT file to contain the line 

SET SPIRIT DIR = C:\SPIRIT30 
(SPIRIT30 is the base directory for the SPIRIT-30 software. If you choose not to have 
this name, give your own base directory name). Now, put the disk called 
APPLICATION NOTE in your disk drive A: and type A:INSTALL. This will install the 
necessary software in the directory C:\SPIRIT30\APPL, where C:\SPIRIT30 is the 
base directory as specified in the AUTOEXEC.BAT. 


PRELIMINARIES 


It is useful to have the following manuals at hand while writing the applications for the 
SPIRIT-30. 


[1]. The SPIRIT-30 Run Time Library documentation 
in the SPIRIT-30 Technical Reference Manual 
[2]. The TMS320C30 Compiler User’s Guide 
[3]. The TMS320C30 Assembly Language Tools User’s Guide 
é 


Go through these manuals at least once to be qualified as a C-30 application 
programmer. 


Make sure that the installation of the TMS320C30 compiler and the other utilities is 
done right. In particular check to see if the PATH environment variable is updated to 
include the path of the directory containing the tools, and the C_DIR variable is 
appropriately set. | 

STEP 0: 

Pray to your favorite spirit. STEPS 1 through 3 deal with writing the C code which will 
be compiled using the TMS320C30 compiler. STEPS 5 through 7 deal with creating the 
PC side of the software for this application. 

STEP 1: 


I have extracted the function ifc() from file ifc.c, and put it in the file difc.c. The prefix 
’@ is added to indicate that this program would reside on the DSP. 
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STEP 2: 


Later, I added the ’monitor’ program to the difc.c. The result of this operation is in the 
file DIFC.C in the application note disk. This now is the entire source file for the DSP 
resident code. Take printout of this file. The main() does the following. 


1. The DSP loops while checking the semaphor memory location’ wait’ = 0. 


2. If wait==IFC, a call to ifc() is made. When the ifc() is executed, the control comes 
back to main(). The semaphor wait is asserted to 0. This assertion of wait=0 by the 
DSP indicates to the host (PC) that the execution of ifc() is over. The DSP loops again 
while wait == 0. This is one way of writing the monitor program. 


= memory location on the board which needs I/O with the PC must be declared as a 
global variable. In the file DIFC.C, all such variables are declared as globals. 


Read the file DIFC.C, especially the comments in the program, which are self- 
explanatory. | 


STEP 3: 


Compile and link the program difc.c. Use the following commands: 
c30c dife <enter> /* COMPILE difc.c USING THE C-30*/ 
/* COMPILER */ 


Ink30 demo.cmd <enter> /* LINK USING THE MEMORY */ 
/* MAP SPECIFIED IN demo.cmd*/ 


The file demo.cmd specifies that the executable DSP code is in the file difc.out. This 
step completes the creation of the DSP resident code. Read the file difc.cmd. The 
object modules linked along with difc.obj, and their contents are as follows 


vectors.obj: Source file - vectors.asm 
This file contains the code which 
resides at the location OH in the 
memory. This code is executed at reset. 
dspieee.obj: Source file - dspieee.c 
Convert an array of numbers to/from TMS 
from/to IEEE floating point formats. 
convert.obj: Source file - convert.asm 
Convert formats of single numbers. 
Called by dsp30() and ieee30(). 
waitst.obj: | Source file - waitst.c 
Program the TMS320C30 for a given number 
of wait states. At reset, the chip runs 
with a default of 7 wait states. 
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STEP 4: 


Create the PC side software. Let us divide the PC resident software into three modules. 
Module 1 is the highest level module, which contains main() on the PC side. This 
module handles user I/O, sets up the graphics and handles other DSP-independent 
functions. In our example, this module is called drawfern.c. Note that this module is 
really the program ifc.c, but it does not have the function ifc(), because we intend to 
have this function reside on the DSP. The module 1 contains the function calls initifc() 
and xifc(), which form Module 2 and Module 3 respectively. 


STEP 5: 


Create Module 2. This module is in the file initife.c. The Module 2 contains the 
function initifc(), which does the following. 


1. Download the code in the file difc.out to the SPIRIT 
board. 


2. Find the addresses of the various variables in the 
program difc.c residing on the SPIRIT. 


3. Start the DSP program. 


STEP 6: 

Create Module 3. This module is in the file xifc.c. The Module 3 contains the function 
xifc(), which is the driver for the DSP resident function ifc(). The function xifc() does 
the following. 


1. Download the input arguments needed by the DSP 
resident function 1fc() to the SPIRIT board. 


2. Ask the DSP to execute ifc() by changing the memory 
location wait =IFC. 


3. Wait till ifc() execution is over by polling on _wait. 
4. Upload the results of execution of ifc() from SPIRIT 
to PC. 
STEP 7: 
Compile and link the PC software. This is done by using the microsoft MAKE utility. 
The makefile is called FERNMAKE. This file compiles the modules 1,2, and 3 if 


necessary and links them together along with the run time library S3OTOOLS.LIB. The 
file drawfern.exe containing the PC executable code is created. 
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DEVELOPING APPLICATION PROGRAM USING SPIRIT-30 


This document describes the steps I followed in developing an application using 
SPIRIT-30 board. I would like to develop a fast floating point matrix multiplication 
program using SPIRIT-30. All the files referred to in this application note are included 
along with this note. 


PRELIMINARIES 


It is useful to have the following manuals at hand while writing the applications for the 
SPIRIT-30. 


[1]. The SPIRIT-30 Run Time Library documentation in 
the SPIRIT-30 Technical Reference Manual. 
[2]. The TMS320C30 Compiler User’s Guide 
[3]. The TMS320C30 Assembly Language Tools User’s Guide 


In the rest of this document these manuals will be referred to by their numbers. 


I have to create two separate programs, one executing on PC, and one executing on 
SPIRIT-30. STEPS 1-2 deal with developing PC side software. STEPS 3-5 deal with 
developing software which resides on SPIRIT-30 (DSP program). 


STEP 1: 


I start writing the PC side software - "matrix.c". In order to interact with the SPIRIT-30, 
I need to use the SPIRIT-30 Run Time Library routines. So I include "s30tools.h", which 
contains function prototypes of the library routines. I have the basic algorithm: 
Download the DSP program, reset the TMS320C30, download the matrices, signal DSP 
program to start computation, wait for DSP program to complete multiplication, upload 
the resulting matrix and print the results. A simple two way handshaking protocol, isn’t 
it? Given below are the details about how to perform these steps. Have a copy of the 
file "matrix.c" in hand while you go through the detailed description 


(Step 1 of matrix.c) 


First thing you need to do is to download the DSP program. What is so special about 
downloading DSP program compared to downloading data? The DSP program is 
nothing but an executable file, which can be executed by C30. This file is called ">COFF’ 
file (refer [2]) and has several sections - text, data, etc. Each of these needs to be loaded 
at appropriate memory locations on SPIRIT-30. There is a library routine called 
"dsp_dl_exec()" (refer [1]) which will do this for you. 
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(Step 2 of matrix.c) 


Now you have downloaded DSP Bide eae onto SPIRIT-30. But how to start executing 
it? Just reset the C30 using the "dsp reset()" library routine. Resetting C30 will pass 
control to the program pointed by the vector in memory location 0H on SPIRIT-30. The 
DSP ’COFF’ file contains a section called "vectors" with loading address 0, which 
contains the reset vector, and this is downloaded by dsp _dl_exec() routine. 


(Steps 3,4 of matrix.c) 


The DSP program has started executing, and is waiting for the PC side software to 
download the matrices which need to be multiplied. You can download data using the 
library routine "dsp dl long array()" (refer [1]). But the question is, where (on SPIRIT- 
30 memory) to download the data? So the next step is to get the starting addresses of 
the matrices used by DSP program. This is oar tag only if the matrices are declared as 
global variables in DSP program (see STEP 3), in which case the information about 
these variables is given in the "COFF ' file. The dsp dl exec() routine not only loads the 
file onto SPIRIT-30 memory but also prepares a symbol table of global variables. We 
have a library routine called "get laddr()" which, when given the global variable name, 
returns the starting address of this variable. So you need to know the variable names of 
the matrices used by DSP program. Use get_laddr() routine to get the starting address, 
and use this address to download the data. 


(Step 5 of matrix.c) 


Now signal the DSP program to start computation. This is similar to steps described 
above. Get the address of the variable ’flag’, and set the flag by downloading a value 
(say) 1 in that location. 


(Step 6 of matrix.c) 


Wait for DSP program to complete the task. Keep polling the variable flag’ by 
uploading and checking the value. The DSP program must reset the flag after 
completing the task (see STEP 3). 


(Step 7 of matrix.c) 


Get the starting address of the resulting matrix, and upload the values. 


STEP 2: 


In STEP 1 we created the PC side software source file matrix.c. Now we have to create 
the executable file. First choose the memory model, say X (L for large, M for medium, 
S for small). Compile using the command 

cl /c /AX matrix.c 
where X in /AX is the first letter of the memory model. Then 
link using the command 

link matrix,,,s30Xtool 
where X is the first letter of memory model. s30Xtool is the SPIRIT-30 Run Time 
Library. 
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STEP 3: 


My aim is to develop a DSP program which does fast matrix multiplication. So I 
cae DSP program into two modules, one written in ’C’ which has the control 
software, and other written in C30 Assembly, which does the actual computation. Let us 
go through the steps in developing module 1, which I call mult.c. Have a copy of the file 
"mult.c" in hand before going through the details. 


(Step 1 of mult.c) 


In order to achive good performance, the first thing to do is to program C30 to ’zero 
wait state’ for fast memory access. On reset, the C30 comes up with seven wait states, 
which is extremely slow. So we have to specifically program C30 to zero wait states. Use 
waitst() routine of waitst.obj. 


(Step 2 of mult.c) 


Before performing the computation, we need to get the values of the input matrices. In 
this case the values are downloaded by the PC side software. So wait for the PC to 
initialize the matrices and set the flag. Make sure to declare the flag and matrices as 
global variables, so that PC can get the starting addresses of these variables (see STEP 
1) and work with these variables using download/upload library routines. If you declare 
them local, they would be allocated on stack, and PC software hak no way of finding out 
the addresses. Also make sure to initialize the ’flag’ variable to zero (reset). 


(Step 3 of mult.c) 


Now PC has given signal to start computation. Can we go ahead and start 

multiplicaton? NO! Note that we are working with floating point matrices. PC uses 

IEEE format to represent floating point numbers, whereas C30 uses it own format, 

which we call TI format. So before we do anything with the numbers, convert them to 

(r fer Dl) Use "dsp30()" routine from "convert.obj" to do the necessary conversion 
refer [2]). 


(Step 4 of mult.c) | 

Now you are all set to start the computation. This is the critical part of DSP program. 
Use an assembly routine compute() from module 2 to perform a fast multiplication. 
Make sure to pass proper parameters. 

(Step 5 of mult.c) 

We have the resulting matrix now. Can we signal the PC software that computation is 
done? NO! First convert the floating point numbers back to IEEE format using 
"ieee30()" routine from "dspiee.obj". Note that if you are working with integers you 
don’t need to do any conversions, because both PC and C30 use the same 
representation for integers. 

(Step 6 of mult.c) 


Everything is done, so reset the ’flag’ to signal PC software. 
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STEP 4: 


The only thing remaining now is to develop module 2 of DSP program - the fast 
assembly routine "compute()", which performs matrix multiplication. Here are some 
clues for developing a ’C’ callable assembly routine. First make sure to declare the 
function as global. 


Note that the arguments to the function are pushed onto the stack in the reverse order. 
When control comes to your assembly routine, the stack pointer will be pointing to the 
return address. The one just below the return address would be the first parameter. 
Have a look at the file "compute.asm". | 


The code produced by ’C’ compiler makes use of the registers AR3-AR7 and R4-R7. So 
if you need to use these as scratch registers, you must save them before you modify 
them, and restore them before you return. In "compute.asm", AR4 and ARS are being 
used by the routine. 


Refer to the note on "How to speed up Applications using SPIRIT-30" to get tips on 
developing efficient code. 


STEP 5: 


The source files for DSP program are ready now. Now we have to create the C30 
executable "COFF’ file. Use the command 

c30c mult 
to compile "mult.c". Link using the command 

1nk30 mult.cmd 
where mult.cmd is the command file for linking. Make sure to have the object file of 
module 2 (compute.obj) and object files convert.obj (dsp30() routine), dspieee.obj 
(ieee30() routine) and waitst.obj (to set zero wait state) in the command file. Also, 
check to see if ".bss" section is in ROM (SPIRIT-30 external memory) or not. You must 
have a line ".bss: {} > ROM" at the end of the command file mult.cmd. This is to make 
sure that the global variables reside in external memory, rather than internal memory. 


Developing Application Program using SPIRIT-30 - 4 


[RRRARRRRERRRREREERREREERERERRERERERERERRERERERREHEEEHEREERERREREREREREREEEH 1 


/* matrix.c - program which demonstrates how to perform matrix */ 
/* multiplication with the SPIRIT-30 board. */ 
j* */ 
/* Steps involved: */ 
/* 1. Download the C30 executable file "mult.out" which performs all f 
{* matrix multiplication. See mult.c, compute.asm files. */ 
f* 2. Reset C30 to start execution of the file loaded in step 1. *] 
i* 3. Get the addresses of global variables (addresses of af) (],b{][),*/ 
j* cf](], and flag) used in "mult.out". These are the SPIRIT-30 */ 
/* external memory addresses. */ 
{* 4. Download the values of matrices af][] and b[][] using the */ 
/* addresses obtained from step 3. "/ 
/* 5. Set the flag (using the address obtained from step 3) to inform */ 
/* DSP program (mult.out) that matrix multiplication can be started*/ 
f* 6. Wait for the flag to be reset by "mult.out", which indicates */ 
{* that the product has been computed - result is stored in c{][]. */ 
/* 7. Upload the resulting matrix c[]{] using the address obtained in */ 
/* step 3. Print the results. */ 
/* mi 
/* Files: */ 
fe The PC source file is matrix.c */ 
{* The PC executable file matrix.exe al 6 
/* The C30 source files are ‘mult.c’ and ‘compute.asm’ */ 
7* The C30 executable file is ‘mult.out’ */ 
/* See ‘mult.c’ file for instructions to create ‘mult.out’ sal! § 
/* */ 
/* To compile this program with Microsoft’s C 5.1 compiler: */ 
ra cl /c matrix.c */ 
/* Link matrix,,,Ss30stool */ 


[ERERERREEKHEKREREERAREEREREEEREREREERARERARREREREEEERAEERAUEREREEEREERE AER / 


#include "s30tools.h" 


float a[4] [3] =¢ /* matrix afl] */ 
{ 1.0, 2.0, 3.0), 
{-1.0, -2.0, -3.0), 
{ 5.0, 6.0, 7.0}, 
{-5.0, -6.0, -7.0} 


; 

float b[3] (4) =¢ /* matrix bf] */ 
(1.0, -1.0, 2.0, -2.03, 
{ 1.0, 1.0, 1.0, 1.0), 
{ 0.0, 0.0, 0.0, 0.03 

3; 

float c[4] [4]; /* matrix c(){) */ 


/* The following are set equal to the corresponding absolute address of the */ 
/* label in the DSP program by calling ‘get_laddr()’ with the label’s name. */ 


long a_addr,b_addr,c_addr; 
long flag_addr; 
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main() 


/* STEP 


/* STEP 


/* STEP 


1 */ 
/* download DSP executable file ‘mult.out’ which performs matrix */ 
/* multiplication ¥/ 


if (dsp_di_exec("mult.out") == -1) ¢ 
printf("Error in downloading the file\n"); 


exit (1); 
> 
2 */ 
dsp_reset(); /* RESET C30 to give control to mult.out */ 
5% / 
a_addr = get_laddr('_a"); /* Find SPIRIT-30 address of af] [] */ 
b_addr = get_laddr("_b"); /* Find SPIRIT-30 address of b[}[] */ 
c_addr = get_laddr("_c"); /* Find SPIRIT-30 address of c[][] */ 


flag_addr = get_laddr(" flag"); /* Find SPIRIT-30 address of flag */ 


/* The above addresses are obtained from the symbol table prepared */ 


/* by dsp_dl_exec() library routine while loading "mult.out" ¥/ 
matrix(a,b,c); /* Procedure to calculate product */ 
mat_print(a,b,c); /* Print the results ad | 
exit(0); 


[REARAREREREHEREREEERERRRRERREREREAHHRERREREEREHERERERRERERRERERRERRREEREKREE | 


/* Procedure to compute to product of two matrices. c = a*b */ 
[REARERREREREEREREEREEREREERERREREAREREEEEAREEEREREERREREEEKEREREHEREREREE | 


matrix(a,b,c) 
float a4) [3]; 
float b{3) [4]; 
float c[4] [4]; 


{ 


int status; 


/* Note: a 4x3 matrix consists of 12 floating pt. numbers */ 


/* STEP 4 */ 
dsp_di_long_array(a_addr,12,(long *)a); /* Download af] [] matrix */ 
dsp_dl_long_array(b_addr,12,(long *)b); /* Download b{]{] matrix */ 
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/* A ‘1° is downloaded to SPIRIT-30 at address ‘flag’ which signals the C30 */ 
/* that the data is ready and to perform a 4x4 product. When the C30 is "/ 
/* done, it sets ‘flag’ = 0 to signal the PC that the data is ready. */ 


status = 1; 
printf("Waiting for DSP program to compute .. \n"'); 


/* STEP 5 */ 
dsp_dl_int_array(flag addr,1,&status); /* set the flag */ 
/* STEP 6 */ 
while (1) ¢ 
dsp_up_int_array(flag_addr,1,&status); /* read the flag */ 
if (status == 0) break; /* check if it 1s reset */ 
1f (kbhit()) exit (0); 
} 
/* STEP 7 */ 


dsp_up_long_array(c_addr,16, (long *)c); /* Upload c{][] matrix */ 


[BARRERA RERRERREREREUREUERERERREERRREREEHERERERERERERERENERERERERRERREREEEEEE / 


/* print the results */ 
[BRAKE RERREREREREREREREKERKERKEKREEEEREREEKREUEREEREREREREREEREEREREREHERRERERRE / 


mat_print(a,b,c) 
float af{4] [3]; 
float b[3] [4]; 
float c[4] [4]; 


{ 
register i,j; 
printf¢'"\n\n"); /* Print results */ 
for (i=0; i<4; ++1) 
{ 
for (j=0; j<3; ++j) 
printf("%10.4f", ali] (j)); 
printf("\n"'); 
} 
printf("\ntimes\n"); 
for (i=0; i<3; ++i) 
{ 
for (j=0; j<ez ++j) 
printf("%10.4f", bCi) (J); 
printf¢("\n"'); 
) 
printf("\nequals\n")> 
for (i=0; i<4; ++j) 
{ 
for (j=0; j<4; ++j) 
printf("%10.4f",cliJ (j]); 
printf("\n"); 
> 
> 
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[BEARER REREREEEEREREREERREERERERERERKEERERERERERRERERERREEEREREREREREREKEKRERE / 


/* mult.c : C30 program which computes the product of two matrices af 
Yh */ 
/* Steps involved: */ 
/* 1. Program C30 to ‘zero wait state’ for fast memory access (default */ 
7° is seven wait states). */ 
/* 2. Wait for the flag to be set by PC program (matrix.exe). PC pgm */ 
/* sets this flag after initializing the matrices af{])(J and b{](] */ 
/* 3. Convert the floating point values from IEEE format to TI format. */ 
/* (PC uses IEEE format whereas C30 used TI format) */ 
/* 4. Perform the matrix multiplication of af)[] & b{][], and place “/ 
/* the result in c{][]. (Use assembly routine for high performance) */ 
{* 5. The resulting matrix c[][] values are in TI format. Convert */ 
/* them into IEEE format. */ 
/* 6. Reset the flag to inform PC program that results are ready / 
/* */ 
/* Files: : bal 
/* The PC source file is matrix.c */ 
/* The PC executable file is matrix.exe */ 
7* The C30 source files are "“mult.c" (this file) & "compute.asm" */ 
/* The C30 executable file is "mult.out". it 
[* */ 
/* To create mult.out using TMS320C30 compiler - ¥/ 
j* c30c mult ('e!’ file) ot 
/* asm30 compute (C30 assembly file) */ 
/* Lnk30 mult.cmd (See mult.cmd file for details) */ 
[RRERERAREREREEREEREEKEEEERREREREKEEEREREREREEKERREEREREERHEREREREKEREREERKEEEE / 
float a4] [3]; /* matrix a(j() */ 
float b(3] [4]; /* matrix bi} {] */ 
float c{4] [4]; /* matrix cf) (] */ 
int flag = 0; /* flag - initialize to zero (reset) */ 
main() 
{ 
/* STEP 1 */ 
waitst(); /* program C30 to zero wait state */ 
/* STEP 2 */ 
while (flag != 1); /* wait for the flag to be set by PC program - this */ 
/* flag is set by PC program after initializing */ 
/* the matrices a(][] and b{)[) ad’ 
/* STEP 3 */ 
dsp30(a,12); /* convert the floating point values of a[]{] and */ 
dsp30(b, 12); /* bI](] from IEEE format to TI format */ 
/* STEP 4 */ 
compute(a,b,c,4,3,4); /* call assembly routine which computes the */ 
/* product of a{)[] & b[][], and puts the */ 
/* results in cf) (J */ 
/* STEP 5 */ 
1eee30(c, 16); /* convert the floating point values of cf) [] "7 
/* from TI format to IEEE format a 
/* STEP 6 */ 
flag = 0; /* reset the flag to inform PC program that the */ 
/* results are ready */ 
> 
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SEPAAR ASAE AAAR SANSA AEWA AE LAAN ARES AA ES AAS SAAS SRO Sree TEN eee eee eee eee eee ee 


3 compute.asm: 

: Synopsis: 

: compute(a,b,c,m,n,p) 

float *a,*b,*c: pointers to two dimensional matrices a(][(],b{) (],c{) () 
Fe int m, nN, p: dimensions of the matrices .. 

: af]lf{] ismxX on, bt] is nX p, clll] mx p 


: Description: 
: This function multiplies matrices a{]{) and b{][], and puts the result 
; in matrix cf) [(] 


3; Stack structure after the call: 


; -fp(7) | p | 
: -fp(6) | n | 
: -fp(5)—s | m | 
. -fp(4) | c | 
: -fp(3)_—s | b | 
; -fp(2) | a | 
: -fp(1) | ret addr | 
: -fp(0)-> | old fp | 
; [erence | 
; Registers modified: 

: rO, ri, r2, arO, ari, ar2, rs, re, re 


s Time taken (# cycles): 
: 7m + 19mp + 2mnp (2 for mpyf || addf instr latency) 


MHRA KREEKREEKREKREEEEEREREREREREREEKEKEEEHREKKEMERERERREEEERERKRKEKREREKK 
a 


fp -set ar3 
-global _compute 
.data 
-text 
_compute 
push fp ; save old fp 
di | sp, fp ; point to the top of the stack 
push ar4 s save ar4 
push ar5 : save ar5 


di *-fp(2),ar0 ; ar0 = a (&a[0) [0] ) 
di *-fp(3),ar1 ; ari = b (&b(0) (0}) 
di *-fp(4),ar2 s ar2 = ¢ (&c[0] [0] ) 
di *-fp(5),ar4 ; ar4 =m 


subi 1,ar4 ; arS = m-1 
di *-fp(7),ar5 ; ar5 =p 
idi ar5,ir0 ; ir0 =p 
subi 1,ar5 3 ar5 = p-1 
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e 
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e 
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e 
s 


e 
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for (i 


= 0; i <= m-1; i++) 
for (j = 0; j <= p-1; j++) 


{FOR LOOP 1] 
[FOR LOOP 2] 


Start of the loop: ar0 = &a[i}) [0] and ari = &b{0] [j) 
Loop 


ldi ari,ri * ri = &b(0) [j) 

x = 0.0 

for (k = 0; k <= n-1; k++) {FOR LOOP 3) 

x += afi} (k] * bk] Cj); 

cLiJ(j) = x; 

(di *-fp(6),rce srce=zn 

subi 1,rc sre =n-1 

ldf 0.0,r0 ; rO = 0.0 

ldf 0.0,r2 ; r2 = 0.0 

rpts re ; repeat ‘n’ times 
moy f *ar0++,*ari++(ir0),r0 ; | After this instruction : 
addf r0,r2 : | arO = &afi+1) (0), 
addf r0,r2 . | ari = &b{n) Cj] 


end of [FOR LOOP 3) 


stf r2,*ar2++ 
dbd ar5, loop 
subi *-£9(6),ar0 
tdi ri,ari 

addi 1,ar1 


end of [FOR LOOP 2] 


cCil({j) =x; 


check index [FOR LOOP 2] 

arO = &a{i] [0] (reposition to i th row) 

ari = &b(0][j] Creposition to j th column) 
arts &b(0] [j+1] (reposition to j+1 th column) 


i th row of c{][] has been computed .. so move on to next row 


addi *-fp(6),ar0 
dbd ar4, loop 
di *-fp(7),ars 
subi 1,ar5 

di *-fp(3),ar1 


s end of [FOR LOOP 1] 


® 
8 


pop ar5 
pop ar4 
pop fp 
retsu 
.end 


= 


8 


ar0 = &af[it+1}) [0] (position to next row) 
check index [FOR LOOP 1] 

ar5 = p (index [FOR LOOP 2]) 

ar5 = p-1 

ari = &b(0) (0) 


- restore ar5 
* restore ar4 


pop the old frame pointer 


; return to the calling program 


compute.asm - 2 


[REAERRHERHERRERRERERERERERRRREREEREEERHRKERERREREREERERERERKEREREKEKEEREREREREE / 


/* MULT.CMD - vi.10 COMMAND FILE FOR LINKING C30 C PROGRAMS */ 
hs */ 
/* Usage: ink30 <obj files...> -o <out file> -m <map file> c.cmd */ 
/* a 2 
{* Description: This file is a sample command file that can be used */ 
/* for linking programs built with the TMS320C30 C */ 
/* Compiler. Use it a guideline; you may want to change */ 
/* the allocation scheme according to the size of your */ 
/* program and the memory layout of your target system. */ 
/* */ 
/* Notes: (1) You must specify the directory in which rts.lib is */ 
f* located. Either add a "-i<directory>" line to this */ 
/* file, or use the system environment variable C_DIR to */ 
/* specify a search path for libraries. “/ 
/* */ 
i (2) When using the small (default) memory model, be sure =f 
j* that the ENTIRE .bss section fits within a single page. */ 
’ bias To satisfy this, .bss must be smaller than 64K words and */ 
/* must not cross any 64K boundaries. */ 
[RERRRRRRARREEEERREREREREERERRERERERERRERERREREREREKEREREREERREREREEEEREERERERE / 
mult.obj 

compute.ob/j 

vectors.obj /* TAKES CARE OF JUMP TO c_int00 */ 

convert .obj /* YEEE TO DSP CONVERSIONS */ 

dspieee.obj /* CONVERTS ARRAYS OF NUMBERS DSP_IEEE */ 

waitst.obj /* PROGRAMS THE CPU FOR DESIRED WAITSTATES */ 

7c /* LINK USING C CONVENTIONS */ 
-o mult.out 

-m mult.map 

-irts.lib /* GET RUN-TIME SUPPORT */ 


/* SPECIFY THE SYSTEM MEMORY MAP */ 


MEMORY 
{ 
VECS: org = 0 len = OxcO 
ROM: org = 0x400 Len = 0x7b00 
RAMO: org = 0x809800 len = 0x400 /* RAM BLOCK 0 */ 


0x809c00 len = 0x400 /* RAM BLOCK 1, PLUS 4K OF EXT */ 


RAM1: org 


/* SPECIFY THE SECTIONS ALLOCATION INTO MEMORY */ 
/* A SIMPLE MINDED MAP WHICH PUTS EVERYTHING IN THE EXTERNAL MEMORY */ 
SECTIONS 


{ 
etext: {} > ROM /* CODE */ 
-cinits {} > ROM /* INITIALIZATION TABLES */ 
«Stack: {} > ROM /* SYSTEM STACK tf 
-data: {} > ROM /* DATA */ 
-bss: {} > ROM /* GLOBAL & STATIC VARS (SEE NOTE 2) az A 
-sysmem: {} > ROM /* DYNAMIC MEMORY - DELETE IF NOT USED */ 

> 
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HOW TO SPEED UP APPLICATIONS USING SPIRIT-30 AND TMS320C30 


This document gives step by step procedure on how I speeded up an a ig 
executing on SPIRIT-30. I have a sample ’C’ program. My goal is to reduce the 
execution time of this program on C30 as much as possible. All the text files (’.c’,’.asm’ 
and ’.cmd’) referred to in this document are enclosed in this packet. This document 
expects the reader to have good knowledge about the TMS320C30 processor, Compiler, 


and Assembler. 


PRELIMINARIES 


It is useful to have the following manuals at hand while going through this document. 
[1]. The TMS320C30 Compiler User’s Guide 
[2]. The TMS320C30 Assembly Language Tools User’s Guide 
[3]. The TMS320C30 User’s Guide 


In the rest of the document, these manuals are referred to by their numbers. 
STEP 1: 


First, I wrote down a sample program "dsp1.c" with some integer and floating point 
multiplications. Note that there is an outer ’for’ loop which runs 10000 times. I have this 
So as to get the execution time in the order of seconds, so that the error percentage is 
less. 


I compiled it using the command "c30c dsp1" and linked using the command "Ink30 
dsp1.cmd", ’dsp1.cmd’ being the command file. This program took 47.79 seconds. 


From next step onwards, for step ’i’ I plan to create ’dspi.c’ by modifying ’dspi-J.c, and 
’dspi.cmd’ by modifying ’dspi-7.cmd’ (if necessary). 


STEP 2: 


How to speed up this application? The first thing which stuck me is that, on reset the 
C30 processor by default assumes a ’seven wait state’ external memory. The SPIRIT-30 
board supports high speed on board memory, which can be used with zero wait states. 
So by using memory in zero wait states, I expect to get about 7 times speed up. 


But how to program C30 to zero wait states? Simple! Use the ’waitst()’ routine of 
’waitst.oby’ fneladed in Application Diskette). I made two changes: (1) Inserted a call 
to waitst( in the very beginning of the dsp1.c -> dsp2.c (2) Modified ’dsp1.cmd’ to 
include ’waitst.obj’ during linking -> ’dsp2.cmd’ 


This program took 7.08 seconds, almost 7 times improvement, as expected. 
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STEP 3: 


Now I am using the fastest memory. So can there be further scope for improvement? I 
am using the fastest possible on board memory, but it is still slower than the on-chip 
memory, which is internal to the C30 processor. Since there are a lot of data accesses in 
this program, I could do better if I use internal memory to store data. 


Note that the variables declared in this program are local, and hence the space is 
allocated on the stack. If you look at the ’dsp2.cmd?’ file, the stack is placed in external 
memory - observe the SECTIONS’ part at the end of the file. The line "stack: {} > 
ROM" (ROM means external memory) makes the linker place the stack in ROM. 
Modify this line to "stack: {} > RAMO" (RAMO means internal memory Bank-0) to 
have stack in internal memory -> dsp3.cmd. ’dsp3.c’ is same as ’dsp2.c’. 


This program took 5.22 seconds, a significant improvement! 


STEP 4: 


Is there anything faster than internal memory, for data accesses? CPU Registers, 
ofcourse! So next step is to use register variables. Unfortunately, TMSC30 complier 
allows only two integer and two floating point register variables. So we have to carefully 
choose which of the variables deserve to be placed in registers. To be specific, we must 
choose the most heavily used variables. ’dsp4.c’ is same as ’dsp3.c’ with some register 
variables. ’dsp4.cmd’ is same as ’dsp3.cmd’. 


This program took 5.00 seconds, not much improvement. Why? Because there are still 
quite a few memory variables, and also, the CPU registers are not drastically faster than 
internal memory. 


STEP 5: 


In all the above steps we have not looked at the actual code. Any ’C’ compiler has its 
own limitations, and it cannot produce 100% optimized code. Also, C30 processor 
supports repeat and parallel instructions, which are not exploited by the ’C’ compiler. 
So in this step we can try to use optimized assembly code in critical regions. 


Step 5.1: 


Have a look at ’dsp4.c’. There are three ’for loops’, which are numbered. At this stage 
you must know how TMSC30 compiler generates code. After preprocessing, it converts 
the ’.c’ file into the equivalent assembly file (’.asm’), which in turn is assembled by the 
TMSC30 assembler, and finally linked by the TMS linker. Have a look at ’dsp4.asm’ 
file, which is the file generated by the complier. Block 1 is the code for the ’for loo 

(1)’. Observe that it takes more than 1200 cycles to execute this part of the code. How 
to optimize this? 
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First of all, note that the ‘for loop’ resembles a simple ’do loop’ with one index running 
from 0 to 99. So we can use a repeat (RPTS/RPTB) instruction. Also observe that the 
array indices increment by constant amount every iteration of the loop. Both the 
instructions can be transformed to two assembly ’STT instructions, which can be 
executed using a ’parallel STT instruction. Look at the optimized code for Block 1 in 
the file 'dsp5.asm’. It takes only about 100 cycles! I modified dsp4.asm by replacing the 
‘for loop 1’ with equivalent optimized assembly code. ’dsp5.cmd’ is same as “dsp4.cmd’. 


This program took 4.12 seconds, expected improvement. 

Step 5.2: 

I did a similar optimization for for loop 2’, and the program executed in 3.08 seconds. 
Step 5.3: 


The last step is to optimize ’for loop 3’, which is very tricky. I took advantage of the fact 
that the first two instructions of the loop are independent of the last two instructions. So 
I can split up the loop into two for loops. Also the first two instructions can be 
combined into a ’parallel MPYI || ADDI’ instruction, and the last two can be 
combined into a ’parallel MPYF || ADDF’ instruction. As a result, Block 3 can be 
optimized using two RTPS parallel instructions. 


Looks as though everything is done, but there is a small problem! Once you convert ’for 
loop 3’ also into assembly, there is absolutely no ’C’ code within the outer ’for loop’ 
(which runs 10000 times). The compiler doesn’t care what you give within asm(") 
instruction. It assumes that it is proper assembly code, and dumps the given instructions 
in the proper place. It is the job of assembler to detect errors in this case. Have a closer 
look at ’dsp5.asm’. The ’C’ compiler uses the register RO as a scratch register to store 
index for the outer for loop. As far as the complier is concerned, there is no body for 
the loop, so it assumes that RO remains unchanged. Unfortunately we are corrupting RO 
in the optimized assembly code. I got over this problem by adding one more assembly 
oa. at the end of the loop, which restores the correct value of the index (see 
sp5.c). 


Now everything is set, and I executed the program. The outcome is unbelievable - 0.28 
seconds! For a moment it looks suspicious, but if you take a closer look at the 
’dspS.asm’ file, you will know the truth. Now, most of the program is in assembly, and 
bulk of the time is taken by the four repeat instructions, which come to about 400 
cycles. Adding the remaining instructions, the body of the outer ’for loop’ must take 
about 450 cycles, which at 60ns/cycle comes to 27 micro seconds. Considering the outer 
loop which runs 10000 times, the total time is 0.27 seconds, which 1s the close to the 
observed value. 


CONCLUSION 


So, from step 1 to step 5 we gota dee up of about 170 times! Here is the summary of 
the steps taken to speed up an application using SPIRIT-30: (Note that you cannot 
expect this much speed up for all the programs. It depends on how much of the code 
you can — using assembly, and how much of the data you can have in internal 
memory). 
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SUMMARY: HOW TO SPEED UP APPLICATIONS 


Step 1: First write the program. 

Step 2: Program C30 to zero wait States. 

Step 3: Place data in internal memory. 

Step 4: Have heavily used variables in registers. 
Step 5: Program the critical sections in assembly. 


To optimize assembly code - 

* Make use of repeat (RPTS/RPTB) instructions. 

* Try to use Parallel instructions. 

* Use delayed branches. 

* Avoid register conflicts for latency- you may 
have to rearrange some instructions (refer to 
’section 10.2: Pipeline conflicts’ of [3]). 

* Avoid memory conflicts - distribute data in 
different memories (refer to section 10.3: 
Memory conflicts’ of [3], refer to Appendix 
of this document). 


APPENDIX 


EXAMPLE: SPEEDING UP APPLICATION BY REMOVING MEMORY 
CONFLICTS 


This appendix explains how I speeded up matrix multiplication program by removing 
memory conflicts. The matrix multiplication program is discussed in the document on 
"Developing Application Program using SPIRIT-30", and you must go through that 
document before reading this section. 


Have a look at ’compute.asm’ file. In the beginning of first page, an approximate timing 
equation is given in terms of m,n,p, the parameters to the function. For large m,n,p, the 
time is proportional to ’mnp’. This is due to the inner most loop, coded using RPTS 
instruction. Observe that there is a constant factor of ’2’ to this product. Have a closer 
look at the RPTS instruction - there are accesses to two different memory locations 
(pointed by ARO, AR1) in external memory, and hence there is memory conflict! An 
extra cycle is needed to fetch the data. If we can avoid this conflict, we can remove this 
factor, and speed up the computation 2 times. 
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How do avoid the memory conflict? A good solution is to place the data in internal 
memory (refer to section 10.3 of [3]). This means we must have the matrices in internal 
memory. In STEP 3 of this document we discussed how to place data in internal 
memory by declaring local variables and having stack in internal memory. It works fine 
for the example discussed (dsp1.c), since there is no interaction between PC and 
SPIRIT-30. But in matrix multiplication, data is initialized by the PC, and so the 
matrices must be declared as global variables (they must be in external memory). 


How to get over this problem? The only solution seems to be to replicate data. So we 
retain the matrices as global variables, but declare exactly same set of variables (with 
different names, ofcourse) local, which can be placed in internal memory. Before 

calling the compute() function, copy the data from external memory to internal memory 
and call the function with local variables. When you return back, you have to copy the ~ 
resulting matrix from internal to external memory so that PC can read back the results. 
Have a look at ’fmult.c’, which is the modified version of mult.c. Also, ’fmult.cmd’ has 
stack placed in RAMO. The new PC side software is ’fmatrix.c’, which is exactly same as 
matrix.c, except that it downloads ’fmult.out’ (modified DSP software) instead of 
"mult.out’. 


There are limitations to this method. There are only 2K words of internal memory, and 


if data exceeds 2K, you have to follow different strategy to distribute data, for e.g., using 
memory on peripheral bus (refer to section 10.3 of Bh. 
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[REARRRREERRERRAREREREREREREERERERERERERRERREEREREEREEEEERERER / 


/* File: dspl.c ad 


[RRRRRRRREREREEERRERERRERERREREEREREREREREERERERREREREREREREEE / 


main() 
{ 
int i; 
int index; 
int intl, int2, int3; 
int array1(100] ,array2(100]; 
float f1, f2; 
float farray1(100), farray2{(100); 


for (1=0;1<10000;i1++) ¢ 


for (index=0; index<100; index++) ¢ 
array1[index] = 10; 
array2[index] = -12; 

> 

for (index=0; index<100; index++) ¢€ 
farrayi{index] = 1.0; 
farray2[index] = 2.0; 


> 

f2 = 0.0; 

int2 = -1; 
inti = 0; 

int3 = 0; 


for (index=0; index<100; index++) ¢ 
intl += int3; 
int3 = arrayl[index] * array2{index]; 
f1 = farrayl[index) * farray2{index]; 
f2 += f1; 


dspl.c 


[RARARERERREREERRRERRERERERREREREEREREERERRERERRERRRERERRERRERERRKRRRRERERERE / 


*/ 


*/ 


i 


/* DSP1.CMD - v1.10 COMMAND FILE FOR LINKING C30 C PROGRAMS 

/* 

/* Usage: tnk30 <obj files...> -o <out file> -m <map file> c.cmd 
/* 

/* Description: This file is a sample command file that can be used 

7* for linking programs built with the TMS320C30 C 

/* Compiler. Use it a guideline; you may want to change 
/* the allocation scheme according to the size of your 

j* program and the memory layout of your target system. 

/* 

/* Notes: (1) You must specify the directory in which rts.lib is 

/* located. Either add a "-i<directory>" line to this 

/* file, or use the system environment variable C_DIR to 
/* specify a search path for libraries. 

/* 

Vaal (2) When using the small (default) memory model, be sure 

/* that the ENTIRE .bss section fits within a single page. 
7* To satisfy this, .bss must be smaller than 64K words and 
f° must not cross any 64K boundaries. 
[RERRRRERRRRRRRERERERERERERRERREEEEREREKERERERRERARRERERERERERERRERURREREREREE / 
dsp1.obj 

vectors.obj /* TAKES CARE OF JUMP TO _c_int00 */ 

-c /* LINK USING C CONVENTIONS 

-o dsp1.out 

-m dsp1.map 

-Irts.lib /* GET RUN-TIME SUPPORT 


/* SPECIFY THE SYSTEM MEMORY MAP */ 


MEMORY 
{ 

VECS: org = 0 len = OxcO 

ROM: org = 0x400 len = 0x7b00 

RAMO: org = 0x809800 len = 0x400 /* RAM BLOCK O 

RAM: org = 0x809c00 len = 0x400 /* RAM BLOCK 1, PLUS 4K OF EXT 
> 


/* SPECIFY THE SECTIONS ALLOCATION INTO MEMORY */ 
/* A SIMPLE MINDED MAP WHICH PUTS EVERYTHING IN THE EXTERNAL MEMORY */ 
SECTIONS 


{ 

etext: {} > ROM /* CODE 

-cinits {} > ROM /* INITIALIZATION TABLES 

«Stack: {} > ROM /* SYSTEM STACK 

data: {} > ROM /* DATA 

-bss: {} > ROM /* GLOBAL & STATIC VARS (SEE NOTE 2) 

-sysmem: {} > ROM /* DYNAMIC MEMORY - DELETE IF NOT USED 
) 


dsp1i.cmd 


ad 


S/ 
dd 


*/ 
*/ 
a 
“/ 
“/ 
aif 


[AREREREREREREREREREEREREEERERRERERRERERRERERUEREEREREREEREEEEE / 


/* Files dsp2.c */ 


[BRRRARERRERREERERREREERERRRRERRERERERERURREREREEEEEEEREEEEEEE 7 


main() 
{ 
int i; 
int index; 
int inti, int2, int3; 
int array1(100] ,array2[100]; 
float fi, f2; 
float farray1[100], farray2 [100]; 


for (i1=0;1<10000;i++) ¢ 


for Cindex=0; index<100; index++) ¢ 
arraylCindex] = 10; 
arrayed [index] -12; 


} 

for Cindex=0; index<100; index++) ¢ 
farray1[lindex] = 1.0; 
farray2[index] = 2.0; 


> 

f2 = 0.0; 

int2 = -1; 
int! = 0; 

int3 = 0; 


for Cindex=0; index<100; index++) { 
inti += int3; 
int3 = arrayllindex) * array2 [index]; 
f1 = farrayl [index] * farray2 [index]; 
f2 += f1; 


dsp2.c 


[RARARARARRERREREREREEREREREHEREHREEERRERERRERRREREREEEERERRERRRRUHRRRKRERREEE 


/* DSP2.CMD - v1.10 COMMAND FILE FOR LINKING C30 C PROGRAMS 
/* 
/* Usage: lnk30 <obj files...> -o <out file> -m <map file> c.cmd 
/* 
/* Description: This file is a sample command file that can be used 
/* for linking programs built with the TMS320C30 C 
/* Compiler. Use it a guideline; you may want to change 
/* the allocation scheme according to the size of your 
/* program and the memory layout of your target system. 
/* 
/* Notes: (1) You must specify the directory in which rts.lib is 
/* located. Either add a "-i<directory>" line to this 
/* file, or use the system environment variable C_DIR to 
/* specify a search path for libraries. 
/* 
[* (2) When using the small (default) memory model, be sure 
/* that the ENTIRE .bss section fits within a single page. 
/* To satisfy this, .bss must be smaller than 64K words and 
/* must not cross any 64K boundaries. 
[RRRRERRERRERKEREREEERREREREREREERKEEKRERERREREREEKEREREREREEEKREREREERREEKKRE 
dsp2.obj 
vectors.obj /* TAKES CARE OF JUMP TO _c_int00 */ 
A PROGRAMS THE CPU FOR DESIRED WAITSTATES */ 
=C /* LINK USING C CONVENTIONS 
-o dsp2.out 
-m dsp2.map 
-irts.lib /* GET RUN-TIME SUPPORT 
/* SPECIFY THE SYSTEM MEMORY MAP */ 
MEMORY 
{ 
VECS: org = 0 len = OxcO 
ROM: org = 0x400 len = 0x7b00 
RAMO: org = 0x809800 len = 0x400 /* RAM BLOCK 0 
RAM1: org = 0x809c00 len = 0x400 /* RAM BLOCK 1, PLUS 4K OF EXT 
> 
/* SPECIFY THE SECTIONS ALLOCATION INTO MEMORY */ 
/* A SIMPLE MINDED MAP WHICH PUTS EVERYTHING IN THE EXTERNAL MEMORY */ 
SECTIONS 
{ 
-text:  {} > ROM /* CODE 
-cinit: {} > ROM /* INITIALIZATION TABLES 
«stack: {} > ROM /* SYSTEM STACK 
-data: {€} > ROM /* DATA 
-bss: {} > ROM /* GLOBAL & STATIC VARS (SEE NOTE 2) 
-sysmem: {} > ROM /* DYNAMIC MEMORY - DELETE IF NOT USED 


dsp2.cmd 


a 


of 


we 


=] 


ad 


*/ 
*/ 


*/ 
*/ 
*/ 
*/ 
=f 
df 


[AERRREHAERERRERERERERERREEHRERERERERRRERREEEREERERERARERREEEEARREEHERRERKEEE / 


/* DSP3.CMD - v1.10 COMMAND FILE FOR LINKING C30 C PROGRAMS */ 
7* bal 
/* Usage: \nk30 <obj files...> -o <out file> -m <map file> c.cmd */ 
/* */ 
Vhs Description: This file is a sample command file that can be used */ 
/* for linking programs built with the TMS320C30 C a/ 
/* Compiler. Use it a guideline; you may want to change */ 
/* the allocation scheme according to the size of your */ 
/* program and the memory layout of your target system. */ 
/* */ 
y* Notes: (1) You must specify the directory in which rts.lib is */ 
/* located. Either add a "-i<directory>" line to this sad! 
/* file, or use the system environment variable C_DIR to */ 
/* specify a search path for libraries. sd J 
yj */ 
/* (2) When using the small (default) memory model, be sure sat J 
/* that the ENTIRE .bss section fits within a single page. */ 
;* To satisfy this, .bss must be smaller than 64K words and */ 
/* must not cross any 64K boundaries. lf 
LEASE AAEESEAAAE SAA AEE A ASSES EAE SSS ASLAN AL ERE SAS SEER SAAR AAAS RARR EAA RR SR EE / 
dsp3.obj 
vectors.obj /* TAKES CARE OF JUMP TO c_int00O */ 
waitst.obj /* PROGRAMS THE CPU FOR DESIRED WAITSTATES */ 
-C /* LINK USING C CONVENTIONS ial 4 
-o dsp3.out 
-m dsp3.map 
-Irts.lib /* GET RUN-TIME SUPPORT */ 
/* SPECIFY THE SYSTEM MEMORY MAP */ 
MEMORY 
{€ 
VECS: org = 0 len = OxcO 
ROM: org = 0x400 len = 0x7b00 
RAMO: org = 0x809800 len = 0x400 /* RAM BLOCK 0 */ 
RAM1: org = 0x809c00 len = 0x400 -/* RAM BLOCK 1, PLUS 4K OF EXT */ 
> 
/* SPECIFY THE SECTIONS ALLOCATION INTO MEMORY */ 
/* A SIMPLE MINDED MAP WHICH PUTS EVERYTHING IN THE EXTERNAL MEMORY */ 
SECTIONS 
{ 
etext: {} > ROM /* CODE . */ 
-cinit: {} > ROM /* INITIALIZATION TABLES */ 
/* SYSTEM STACK */ 
-data: KO} /* DATA */ 
.bss: {} > ROM /* GLOBAL & STATIC VARS (SEE NOTE 2) */ 
-sysmem: {} > ROM /* DYNAMIC MEMORY - DELETE IF NOT USED */ 
) 


dsp3.cmd 


[RARARRRERRERERRRERERREREREERERREERERRERREREERERRERERERKRUREEE / 


/ 


* File: dsp4.c 


lf 


[RERREERARRERREREERREREAREREERREERERERERERERERRRRKERERERERERERE / 


main() 


€ 


int i; 
int index; 


waitst(0); 


for €1=0;1<10000; i++) ¢ 


for Cindex=0; index<100; index++) ¢ (4) 
array1l[index] = 10; 
array2[index] = -12; 

} 


for (index=0; index<100; index++) ¢ ) 
farray1 [index] = 1.0; (4 


farray2 [index] = 2.0; 


> 

f2 = 0.0; 

int2 = -1; 

intl = 0; : 
int3 = 0; 


for (index=0; index<100; index++) ¢ 


intl += int3; 3) 
int3 = arrayl{index) * array2 [index]; ( 
fi = farrayl[Lindex] * farray2 [index]; 
f2 += f1; 
) 
} 
> 


dsp4.c 


RRAEAKRARREKKERERREREREKERERRRKEKERHKERRERRREEERRERERRREERE 


TMS320C30 C COMPILER 


Version 1.10 


REARKEKRRKEKERRERERREREREEREKRKRERERERERERREREREREREREKEER 


FP set 


.sect 
-word 
-word 
-globl 
.bss 
-globl 
-file 
-text 


AR3 


".cinit" 
1,_flag+0 
2 

_flag 
_flag,1 
_main 
"dsp4.c" 


RAERKEEKKEKRKKKKAKKKERKKRKRRKKKKKKKKKKKKKKKKKKKKKKKKKKkKKkkKh 


* FUNCTION DEF : _main 


RHERKEREKEKERKKKKEKKKKRKRKKKKRKKKKKKKKKKKKKKKKkKkKKKKKKKkKKKKKK 


_main: 

PUSH FP 

LDI SP, FP 

ADDI 403,SP 

PUSH R4 

PUSH R5 

PUSHF R6 

PUSHF R7 

LDI 0,R0 

PUSH RO 

CALL _waitst 

SUBI 1,SP 
L2: 

LOI @_flag,RO 

CMP] 1,R0 

BNZ L2 

LOI 0,R1 

STI R1,*+FP(1) 
L5: 

LDI *+FP(1),R0 

CMP] 10000,R0 
cies een ~—ap “SR SSR “SEE 

LDI 0,R1 

STI R1,*+FP(2) 
L8: 

LOI *+FP(2),RO 

CMP] 100,R0 

BGE L7 

ADDI RO,FP,ARO 

LDI 10,R1 

STI R1,*+AR0(4) 

ADDI RO, FP,ARO 

LD! -12,R2 

BD L8 

STI R2,*+AR0( 104) 

ADDI 1,R0 

STI RO, *+FP(2) 
uae B L8 ;BRANCH OCCURS 
em, ‘ame - om =—=—_ we ae _— «im 


dsp4.asm - 1 


Li1: 


kat 


0,R0 
RO, *+FP(2) 


*+FP(2),RO 
100,R0 

L10 

RO, FP,ARO 
QCONST+0,R1 
R1,*+AR0(204) 
RO, FP, ARO 
QCONST+1,R2 
304, 1RO 

L11 

R2, *+AROCIRO) 
1,R0 

RO, *+FP(2) 


L11 sBRANCH OCCURS 


= Pe we ee ee oe 


QCONST+2,R7 
-1,R1 
R1,*+FP(3) 
0,R4 

0,R5 


RO, *+FP(2) 


*+FP(2),RO 
100,R0 

L13 

R5S,R4 

RO, FP,ARO 

RO, FP,AR1 
*+AR0(4),RO 
*+AR1(104),R1 
MPY_I 

RO,RS 
*+FP(2),R1 
R1,FP,ARO 
R1,FP,AR1 
*+AR0(204),R6 
304, 1RO 
*+AR1(1RO),R6 
L14 

R6,R7 

1,R1 
R1,*+FP(2) 


orro™ — w assaeo O 


L14 ;BRANCH OCCURS 
—wp oOo _ —/ 


_—_ Pll 


L5 
*+FP(1),R1 
1,R1 
R1,*+FP(1) 


L5 ;BRANCH OCCURS 
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LDI 0,R1 

STI R1,@_flag 
L16: 

B L16 
RAKE EREREKREKEEEEEEKERRERREREKREERREEEEREREERERREEEKREE 
* DEFINE CONSTANTS * 


REEKEREKERREKEEREERREKEREREREREEERRRERREREREREEREREREKREREKEEE 


.bss CONST,3 


-sect ".cinit" 

eword 3,CONST 

-float 1. 30 

float 2. 21 

-float 0. Par 
RHRARARARRAKERARRERERRKEKKKKERRKEKRRERRKREREREKREKEREREEKRREKE 
* UNDEFINED REFERENCES a! 


HAKKAR AKREREEREEKEEEEEREREEKREKKRKKKKKKKKKKKKKKKKKkKKKKE 
-globl _waitst 


-globl MPY_I 
end 


dsp4.asm - 3 


[RARRRERERERREREEERERERREREREREREERERREREERERERRERREREREREEEE / 


/* File: dsp5.c a 


[ REREREREREEEREREKEEREERERRERREREERREREEERRERRERERREREREREREEREE 


main) 
{ 
int i; 
int index; 
int int2; 
register int int1, int3; /* Register R4 for inti, RS for int3 */ 
int array1(100] ,array2(100); 
register float fi, f2; /* Register R6 for f1, R7 for f2 */ 
float farray1[100) , farray2[100); 


waitst(0); 
while (flag != 1); 


for (i=0;1<10000; i++) ¢ 


/* 
for Cindex=0; index<100; index++) { 


arrayl[index] = 10; 
array2[index] = -12; 
> 
et 
asm("" LOI 10, RO"): /* constant 10 -> RO */ 
asm("" LDI -12,R1"); /* constant -12 -> R1 */ 
asm("" LOI FP, ARO"); /* frame pointer -> ARO */ 
asm(" ADDI 4, ARO"); /* Address of array! -> ARO */ 
asm("" LDI FP, AR1"); /* frame pointer -> ARI */ 
asm("" ADDI 104, AR1"')> /* Address of array2 -> AR1 */ 
asm("' RPTS 99"); /* Do the following parallel instruction 100 times */ 
asm("" STI RO, *ARO++"'); /* 10 -> array! Cindex++] */ 
asm("" || STI R1, *AR1++"); /* -12 -> array2[indext++] */ 
/* 


for (index=0; index<100; index++) ¢ 


farray1 [index] = 1.0; 
farray2[index] = 2.0; 
> 
a 
asm(" LOF 1.0, RO"); /* constant 1.0 -> RO */ 
asm("" LDF 2.0,R1"); /* constant 2.0 -> R1 */ 
asm("" LOI FP, ARO"); /* frame pointer -> ARO */ 
asm(" ADDI 204, ARO"); /* Address of farray1 -> ARO */ 
asm(" LOI FP, AR1"); /* frame pointer -> AR1 */ 
asm("" ADDI 304,AR1"); /* Address of farray2 -> AR1 */ 
asm(" RPTS , 99"); /* Do the following parallel instruction 100 times */ 
asm(" STF RO, *ARO++"')> /* 1.0 -> farray1[index++] */ 
asm(" || STF R1, *AR1++"); /* 2.0 -> farray2Lindex++] */ 


dsp5.c - 1 


/* 


for (index=0; index<100; index++) ¢ 
intl += int3; 
int3 = arrayilLindex] * array2[index]; 
f1 = farray1[index] * farray2[index]; 


f2 += f1; 
> 
ad A 
asm(" LDI FP, ARO"); /* frame pointer -> ARO */ 
asm("" ADDI 4, ARO"); /* Address of array! -> ARO */ 
asm(" LOI FP, AR1"); /* frame pointer -> ARO */ 
asm(" ADDI 104,AR1"):; /* Address of array2 -> AR1 */ 
asm("" LDI R4, R3"); /* intl (=0) -> R3 */ 
asm("" LOI R5, Ri"); /* int3 ¢(=0) -> R1 */ 
asm("" RPTS 99"); /* Do the following parallel instruction 100 times */ 
asm("! MPYI *ARO++, *ARI++, RI"): /* arrayl[Lindex++] * array2[index++] -> int! */ 
asm(" || ADDI R1, R3, R3"); /* inti += int3 */ 
asm("" LOI R3, R4"); /* R3 -> inti */ 
asm(" LDI R1, R5""); /* R1 -> ints */ 
/* R4,R5 cannot be used in the above parallel */ 
/* instruction, so R1,R3 are being used */ 
asm(" LOI FP, ARO"): /* frame pointer -> ARO */ 
asm("" ADDI 204, ARO"): /* Address of farrayi -> ARO */ 
asm("" LOI FP, ARI"); /* frame pointer -> AR1 */ 
asm("' ADDI 304,AR1"'); /* Address of farray2 -> AR1 */ 
asm("" LDF R6, RO"): /* £1 (7) -> RO */ 
asm("" LDF R7, Re"); /* f2 (0.0) -> Ri */ 
asm("" MPYF *ARO++, *AR1++, RO")> /* farrayl[index++] * farray2[index++] -> f1*/ 
asm(" RPTS 99"'): /* Do the following parallel instruction 100 times */ 
asm("" MPYF *ARO++, *AR14++, RO"): /* farrayl[index++] * farray2[index++] -> f1 */ 
asm(" || ADDF RO, Re"); /* f2 += fi */ 
asm(" LDF  — RO, R6"); /* RO -> f1 */ 
asm(" LDF R2, R7"); /* R2 -> f2 */ 
/* R6,R7 cannot be used in the above parallel */ 
/* instruction, so RO,R2 are being used */ 
asm("" LOI *+FP(1), RO"): /* 'C’ compiler uses RO to store the index for */ 
/* the outer loop, and assumes that it is not */ 
/* corrupted. But we are corrupting it - so x] 
/* restore back its value from variable’index’ */ 
d 
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RARER EEERHRRERMEKREREKEEKEREEKKKKKRRHUEKEKKKK 


* T™S320C30 C COMPILER Version 1.10 


RREEEKERREKKREEKEKEKEKEKEREREKEKEEREKEKEEEEKREEKEERRRUEEKEEKKKEKE 


FP set AR3 


sect " cinit" 
word 1, flag+0 
-word 2 


-globl flag 
.bss _flag,1 
-globl —_main 
efile  "dsp5.c" 
text 


RRR KKEKEKKEKKERERERRREKEKREKEKEKKERERERKEEEEKKEKERERUER 


* FUNCTION DEF : _main 


REKERRRRRUERKREKEKREREREEEKERERKERREREREKREEKRERERKKKEKRKKRKRKKK 


_main: 
PUSH FP 
LOI SP, FP 
ADDI 403,SP 
PUSH R4 
PUSH R5 
PUSHF R6 
PUSHF R7 
LDI 0,R0 
PUSH RO 
CALL _waitst 
SUBI 1,SP 

LDI 800H, ST 

Le: P 
LOI @ flag,RO 
CMP! 1,R0 
BNZ Le 
LDI 0,R1 
STI R1,*+FP(1) 

L5: 
LDI *+FP(1),RO 
CMP] 10000, RO 
BGE L4 


dsp5.asm - I 


atl ——_—_—a_— —_ — \ 
LOI 10, RO \ 
LDI -12,R1 lol 
LOI FP, ARO G 


ADDI 4, ARO 


LOI FP, AR1 
ADDI 104,AR1 
RPTS 99 
STI RO, *ARO++ 
|| STI R1, *AR1++ 
—™ w= «ae «aD ow of 


a= > 
LOF 1.0, RO 
LDF 2.0,R1 


LOI FP, ARO gc L- 


ADDI 204, ARO 


LDI FP, AR1 
ADDI  304,AR1 
RPTS 99 
STF RO, *ARO++ 
|| STF RN AR, es ah 
a ae OO OEDE QCONST+0,R7 
LDI -1,R1 
STI R1,*+FP(3) 
LDI 0,R4 
=~ %& @ we 2 Bae = os —-_@- 
LDI FP, ARO 
ADDI 4, ARO 
LDI FP, AR1 
ADDI 104,AR1 
LDI R4, R3 
LDI R5, R1 
RPTS 99 
MPYI *ARO++, *ARI++, R1 
|| ADDI R1, R3, R3 
LDI R3, R4 
LDI R1, 25 am wee 


Lor ~ “FP, ARO 

ADDI 204, ARO 

LDI —sFP, ARI 

ADDI 304, AR1 

LDF —s_ R6,_ RO 

LDF —s-R7,_R2 

MPYF  *ARO++, *ARI++, RO 


RPTS 99 
MPYF *ARO++, *AR1++, RO 
|| ADDF RO, R2 


LOF RO, R6 
LOF R2, R7 


EO sae ke —_— on 
BD L5 
ADDI 1, RO 
NOP 
STI RO, *+FP(1) 
4% B LS ;BRANCH OCCURS 
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L4: 


LDI 0,R1 

STI R1,@_flag 
L7: 

B L7 
RRRAARKARUERREKKERRRREKRRRRREERKEREERERERRERRRRERERUKRER 
* DEFINE CONSTANTS . 


HAUERERKRKREERRARURRKRERRERREERRERRRERRRREERRRRREEKEEREKREKEK 


-bss CONST, 1 


sect " cinit" 

.word  1,CONST 

float 0. :0 
RRA RREKRAEKEKEERRERKKKRRKKRKEKKKKKRKKRKKKKRKKRKRKKREKRKE 
* UNDEFINED REFERENCES * 


REEARKKRAEKERREKKRKREREERKRERRRERREEREKREKRREKKKKKEKKKKKKK 


-globl —_waitst 
.end 


dsp5S.asm - 3 


[RRRAERERRREKREREREREREEEREREEREERRERRERERREREEREREEERERREERERERERREEREEHREEHEE 7 


/* fmult.c : C30 program which computes the product of two matrices */ 
/* - avoids memory conflicts by having data in internal memory - */ 
{* */ 
/* Steps involved: =i 
/* 1. Program C30 to ‘zero wait state’ for fast memory access (default */ 
/* is seven wait states). */ 
f* 2. Wait for the flag to be set by PC program (matrix.exe). PC pgm */ 
/* sets this flag after initializing the matrices af{](] and b(](] */ 
/* 3.(a) Copy the input data into internal memory (arr1(][],arr2t]{]) */ 
/* 3.(b) Convert the floating point values from IEEE format to TI */ 
/* format (PC uses IEEE format whereas C30 used TI format) */ 
/* 4. Perform the matrix multiplication of arr1(](] & arr2{]{], and */ 
/* place the result in arr3(]{]. (Use assembly routine for high "/ 
/* performance) i 
/* 5.(a) The resulting matrix arr3{][] values are in TI format. */ 
/* Convert them into IEEE format. dl 
/* 5.¢b) Copy the results (arr3{]{[]) back to external memory (c{][]) */ 
/* 6. Reset the flag to inform PC program that results are ready Ff 
/* */ 
/* Files: *] 
7* The PC source file is fmatrix.c ~/] 
/* The PC executable file is fmatrix.exe af 
/* The C30 source files are "fmult.c" (this file) & "compute.asm" */ 
Ys The C30 executable file is "fmult.out". */ 
fm */ 
/* To create fmult.out using TMS320C30 compiier - */ 
f* c30c fmult (‘c’ file) *7 
/* asm30 compute (C30 assembly file) */ 
f* ink30 fmult.cmd (See fmult.cmd file for details) */ 


[RERREREERRREREREEERERREREEEEREEREREREEREEREEKREREREREREREKEREEEHRREERREREREEE / 


float a4) (3); /* matrix af]] */ 

float b{3] (4); /* matrix bt] (] */ 

float c[4] [4]; /* matrix c(][] */ 

int flag = 0; /* flag - initialize to zero (reset) */ 

main() 

{ 

/* Declare exactly same matrices (a,b,c) with different names. These are */ 
/* local variables, and will be allocated on stack (internal memory). */ 


float arr1(4] [3], arr2(3)(4], arr3(4) (4); 


int i,j; 


/* STEP 1 */ 
waitst(); /* program C30 to zero wait state */ 
/* STEP 2 */ 
while (flag != 1);  /* wait for the flag to be set by PC program - this */ 
/* flag is set by PC program after initializing */ 
/* the matrices a{][] and bf] [] */ 


frult.c - 1 


/* 
/* 


/* 


/* 


/* 


/* 
/* 


/* 


STEP 3.(a) */ 
Replicate data: copy data from external to internal memory */ 


ali) (jj; 
bli) (jl; 


for (i=0;1<4;i++) for (j=0;j<3; j++) arriCi) Cj] 
for (i=0;1<3;i++) for (j=0; j<4; j++) arr2{i) (j) 


STEP 3.(b) */ 

dsp30¢arr1,12); /* convert the floating point values of arri{[][] and */ 
dsp30(arr2,12); /* arr2(i(] from IEEE format to TI format */ 
STEP 4 */ 


compute(arr1,arr2,arr3,4,3,4);/* call assembly routine which computes the */ 
/* product of arri(]{(] & arr2(](], and puts */ 


/* the results in arr3{() (J ad 
STEP 5.(a) */ 
jeee30(arr3, 16); /* convert the floating point values of arr3[] [] sd 
/* from TI format to IEEE format */- 
STEP 5.(b) */ 
copy back the results into matrix c{]{} - external memory */ ; 
for (i=0;1<4;i++) for (j=0;j<4; j++) cli) (j] = arr3 Ci) (jj; 
STEP 6 */ 
flag = 0; /* reset the fiag to inform PC program that the ly | 
/* results are ready */ 


fmult.c - 2 


*/ 


[ERERRRHRRERERRERRERERERERRERRERERRERERRERERHEREEREEREERERERREEREERRRRERRREEE / 
/* FMULT.CMD - v1.10 COMMAND FILE FOR LINKING C30 C PROGRAMS 
j* */ 
/* Usage: ink30 <obj files...> -o <out file> -m <map file> c.cmd */ 
/* */ 
> Description: This file is a sample command file that can be used */ 
/* for linking programs built with the TMS320C30 C */ 
f* Compiler. Use it a guideline; you may want to change */ 
j* the allocation scheme according to the size of your */ 
/* program and the memory layout of your target system. */ 
Vhs */ 
/* Notes: (1) You must specify the directory in which rts.lib is ~y/ 
j* located. Either add a "-i<directory>" line to this */ 
/* file, or use the system environment variable C_DIR to a 
/* specify a search path for libraries. */ 
[* */ 
/* (2) When using the small (default) memory model, be sure =/ 
/* that the ENTIRE .bss section fits within a single page. */ 
77 To satisfy this, .bss must be smaller than 64K words and */ 
/* must not cross any 64K boundaries. */ 
[RRAAERERREREERRARRERERREERRRERRARREREKEEERERRERERERRERRURARERRERERERRERRERERE / 
fmult.obj 
compute.obj 
vectors.obj /* TAKES CARE OF JUMP TO _c_int00 */ 
convert.obj /* IEEE TO DSP CONVERSIONS */ 
dspieee.obj /* CONVERTS ARRAYS OF NUMBERS DSP_IEEE */ 
waitst.obj /* PROGRAMS THE CPU FOR DESIRED WAITSTATES */ 
=C /* LINK USING C CONVENTIONS wtf 
-o fmult.out 
-m fmult.map 
-(rts.lib /* GET RUN-TIME SUPPORT a f 
/* SPECIFY THE SYSTEM MEMORY MAP */ 
MEMORY 
{ 
VECS: org = 0 len = Oxc0 
ROM: org = 0x400 len = 0Ox7b00 
RAMO: org = 0x809800 len = 0x400 /* RAM BLOCK 0 */ 
RAM1: org = 0x809c00 len = 0x400 /* RAM BLOCK 1, PLUS 4K OF EXT */ 
d 
/* SPECIFY THE SECTIONS ALLOCATION INTO MEMORY */ 
/* A SIMPLE MINDED MAP WHICH PUTS EVERYTHING IN THE EXTERNAL MEMORY */ 
SECTIONS _ 
{ 
etext:  {} > ROM /* CODE */ 
-cinit: > > ROW /* INITIALIZATION TABLES */ 
/* SYSTEM STACK */ 
data: ROP /* DATA add 
.bss: {} > ROM /* GLOBAL & STATIC VARS (SEE NOTE 2) */ 
-sysmem: {} > ROM /* DYNAMIC MEMORY - DELETE IF NOT USED */ 


fmult.cmd - 1 


SPIRIT-30 
PARALLEL I/O INTERFACE 


SPIRIT-30 has a 50 pin bi-directional 16 bit parallel I/O interface for high speed 

data acquisition, frame grabber, and memory interface. Using this interface, the 

data transfer can take place directly from an external device to the TMS320C30 
pci jl = aa Typical transfer rates of upto 5 Million 16 bit words/sec can 
e achieved. 


Signal Description: 


The SO pin connector layout and a description Is given in the attached table 
(Table 2.2). The direction of the signals are from the viewpoint of the SPIRIT-30. 
The attached Figure 1.0 shows the P1 connector and IC’s U1, U10, and U18 on 
the SPIRIT-30. U1 and U10 are 74ACT245 trancievers for the data signal. U18 
is the buffer for the port addresses (AO, Al, A2, A3) and control signals. Other 
buffered signals available on the I/O bus are : (Note: ’X’ indicates SPIRIT-30 
parallel I/O bus or TMS320C30’s Expansion bus) 


X1.RESET* (QO) - When low, the device is put in RESET along with the SPIRIT-30 
board. 

X1.XFO (QO) - External flag pin which needs to be formatted as an output from the 
C30. This control signal can be used for synchronization of the external board 
with the SPIRIT-30, triggering of acquisition, interlocked operation for 
multiprocessing configuration, and other control tasks. 

X1.INTI & LACK* (O) - External interrupt 

X1.RDY* (I) - Ready signal from the external board to SPIRIT-30 to indicate that 
the device is ready 

X1.IOSTRB* - I/O strobe to access the external board (indicates valid address) 

X1.R/W* - Read/Write on the parallel I/O port. When this pin is high a Read is 
done by the SPIRIT-30. 

X1.H1, BUF.CLOCK, X1.H3 - BUF.CLOCK is the buffered 8 Mhz clock, H1 and H3 
are 16 Mhz clocks from the DSP. 


Board Interface: 


Using the buffered address, data, and control signals provided on the SPIRIT-30 
parallel I/O, a simple hardware interface needs to be designed for an external 
board. The circuitry needed on the external board is shown in Figure 2.0. The 
two 74HCT245 buffers are not necessary unless you have several external boards. 
Address decoders (74HCT138’s) are needed to decode the lower four addresses 
(A0..A4) of the SPIRIT-30 I/O port. 


Physical Layout: 


The SPIRIT-30 has a 50-pin male IDC connector with two rows of 25 pins. The 
cable length should be kept as short as possible to avoid problems due to 
transmission line effects (max length of 12 inches). 


References: 
TMS320C30 User’s Guide pages 2-3,8-10..8-18, 13-17..13-20). 
SPIRIT-30 Technical Reference Manual, pages 2-5..2-7. 


From : Sonitech Int. Inc., 83 Fullerbrook Rd., Wellesly, MA. 
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External Bus Operation - External Interface Timing 


8.2.2 Expansion Bus I/O Cycles 


In contrast to primary bus and MSTRB cycles, JOSTRB reads and writes are both 
two cycles in duration (with no wait states) and exhibit the same timing. 
During these cycles, address always changes on the falling edge of H1, and 
IOSTRB is low from the rising edge of the first H1 cycle to the rising edge of 
the second H1 cycle. The [OSTRB signa! always goes inactive (high) between 
cycles, and XR/W is high for reads and low for writes. 


Figure 8-8 illustrates read and write cycles when IOSTRB is active and there 
are no wait states. For f[OSTRB accesses, reads and writes require a Minimum 
of two cycles. Some off-chip peripherals may change their status bits when 
read or written. Therefore, it is important that valid addresses be maintained 
when communicating with these peripherals. For reads and writes when 
J\OSTRB is active, [OSTRB is completely framed by the address. 


Figure 8-8. Read and Write for IOSTRB = 0 


External Bus Operation - External Interface Timing 


Figure 8-9 illustrates a read with one wait state when TOSTRB is active, and 
Figure 8-10 illustrates a write with one wait state when IOSTRB is active. For 
each wait state added, IOSTRB, XR/W, and XA are extended one clock cycle. 
Writes hold the data on the bus one additional cycle. The sampling of XRDY 


is repeated each cycle. 
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Figure 8-9. Read with One Wait-State for IOSTRB = 0 
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=e cycle a 


Figure 8-10. Write with One Wait-State for IOSTRB = 0 
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13.3 Expansion Bus Interface 


The TMS320C30s expansion bus interface provides a second complete paral- 
lel bus which can be used to implement data transfers concurrently with, and 
independent of, operations on the primary bus. The expansion bus comprises 
two mutually exclusive interfaces controlied by the MSTRB and IOSTRE sig- 
nals, respectively. These two signals are activated depending on what section 
of the memory space is accessed. This subsection discusses interface to the 
expansion bus using IOSTRB ; MSTRB cycles are identical in timing to primary 
bus cycles, and are discussed in Section 13.2. 


Unlike thre primary bus, both read and write cycles on the |/O portion of the 
expansion bus are two H1 cycies in duration and exhibit the same timing. Thr 

XR/W signal is high for reads and low for writes. Since I/O accesses take two l 
cycles, many peripherals that require wait states if interfaced either to the pri- 
mary bus or using MSTRB may be used in a system without the need for wait 
states. Specifically, any devices with address access times greater than the 
35 ns required by the primary bus but not less than 46 ns can be interfaced 
to the 1/O bus without wait states. 


A/O converters are one common DSP system component which often falls 
into this category. These devices are available in many speed ranges and with 
a variety of features, and while some may require one or more wait states on 
the |/O bus, others may be used at full speed. 


One A/D converter that. interfaces to the |/O bus without wait states and re- 
quires minimal additional logic is the ad 1332 from Analog Devices. Figure 
13-12 illustrates an interface to this device. 

é 
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TMS320C30 


AD1332 


Figure 13-12. Expansion Bus Interface to A/D Converter 


The interface uses a 74ALS138 to decode chip select for the converter. This 
configuration is shown assuming that other peripheral devices in the system 
also require chip select decodes. XA(8-10) are decoded to locate the con- 
verter at address 0804000h, which is the beginning of the !/O address space. 
Other peripherals may also use the outputs of the decoder, which generates 
chip selects in the 1/O address space on 256 word boundaries. 


XAO is used to drive the single address line required in interfacing to the con- 
verter. This input selects between an Anternal 32-word FIFO buffer and the 
A/D’s control/status register. Thus, the FIFO is located at address 0804000h 
and the control/status register is located at address 0804001h. 

4 
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/ 


Since the converter requires RD and WR control signals rather than WE and 
GE, random logic is used to generate these signals from TOSTRB and XR/W. 
The converters TRO (Interrupt Request) output is used to alert the 
TMS320C30 to various conditions of converter status. 


Figure 13-13 shows the timing for read and write operations between the 
TMS320C80 and the AD1332. Both operations are shown on the same tim- 
ing diagram since, unlike the primary bus, only data bus timing and the state 
of XR/W differ between the two different types of cycles. 
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Figure 13-13. Timing of Expansion Bus Interface 


In both cases, address and R/W are valid ti = 10 ns after the falling edge of 
H1. After t2 = 17 ns, the propagation delay of the 74ALS138, the A/D con- 
verter's chip select goes low, selecting the device. Then, t3 = 10 ns after the 
rising edge of H1, IOSTRB goes low, and t4 = 5.8 ns following this, the RD or 
WR signal to the converter goes low, initiating either a read or write cycle, re- 
spectively. 


For a read operation, the A/D converter provides data back to the TMS320C30 
t4 + t5 = 30.8 ns after RD goes low. This satisfies the TMS320C30’'s re- 
quirement of having data valid 35 ns after JOSTRB. For write operations, the 
A/D converter requires less than 5 ns of data setup and hold time with respect 


13-19 


Hardware Applications - Expansion Bus Interface 
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to the rising edge of WR. This is met with a high degree of margin by the 
TMS320C30. 


it should be noted that for the AD1332’s FIFO to be clocked properly, the RDB 
signal must go high between accesses to the device. Therefore, although the 
AD1332 may be fast enough in some cases to be used at speeds approaching 
those of the primary bus, the STRB signal on the primary bus stays low for 
multiple consecutive read cycles. The I/O bus, therefore, is the preferable 
choice for interface to this device. 
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